

Edith J. CISNEROS-COHERNOUR 

Universidad Autónoma de Yucatán



VALIDITY AND EVALUATIONS 
OF TEACHING IN HIGHER 
EDUCATION INSTITUTIONS 
UNDER POSITIVISTIC PARADIGM 



ABSTRACT 

This paper focuses on the validity of the research conducted under 
the leading paradigm in the evaluation of teaching in higher 
education. Messick’s framework on validity is used to identify the 
strengths and limitations of the research, mostly centered on the 
study of student ratings of instruction. Critical issues that need to 
be addressed by future studies in this area are also identified. 

DEFINITIONS OF KEY WORDS 

• Validity - linked to the meaning, value, and appropriateness of
interpretation. It is the overall judgment of the extent to which
empirical evidence and theory support the adequacy and
appropriateness of the interpretations based on the assessment
or evaluation (Messick, 1995, p. 741).

• Faculty evaluation - judging the value or competence of
administrative, instructional, or other academic staff in higher
education institutions (colleges, universities) based on established
criteria.

• College teaching - the process by which knowledge, attitudes, or
skills are deliberately conveyed in higher education institutions; it
includes the total instructional process, from planning and
implementation through evaluation and feedback.

• Higher education - all education beyond the secondary level
(12th grade), leading to a formal degree.

• Evaluation - the process of determining the merit, worth, or value
of something, or the product of that process (Scriven, 1991).

• Assessment - often used as a synonym for evaluation, but
sometimes used to refer to a process that is more focused on
quantitative and/or testing approaches (Scriven, 1991).

• Teaching - synonym for instruction.

• Instruction - the process by which knowledge, attitudes, or skills
are deliberately conveyed; it includes the total instructional process,
from planning and implementation through evaluation and feedback.

• Reliability - the consistency or stability of a measure or test
from one use to the next. When repeated measures of the same
thing give identical or very similar results, the measurement
instrument is said to be reliable (Vogt, 1993).

• Evaluation instrument - an instrument used for determining the
merit, worth, or value of something.

• Assessment instrument - an instrument used for evaluation.

• Generalizability - the extent to which conclusions about a
population can be drawn from information about a sample.
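
The definition of reliability given above can be made concrete with a small sketch. Assuming, purely for illustration, that the same instructors are rated twice with the same form, the consistency of the two administrations can be summarized with a Pearson correlation. The function name and the ratings below are hypothetical and are not taken from any study cited in this paper.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equally long lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined when one list has no variance")
    return cov / sqrt(var_a * var_b)


# Hypothetical mean ratings of five instructors on two administrations
# of the same form (illustrative numbers only).
first_use = [4.2, 3.1, 4.8, 2.9, 3.7]
second_use = [4.0, 3.3, 4.7, 3.0, 3.9]
test_retest = pearson_r(first_use, second_use)
print(f"test-retest correlation: {test_retest:.2f}")
```

A coefficient close to 1 indicates stable scores from one use to the next. It does not by itself show that the scores measure teaching quality, which is the validity question this paper addresses.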

INTRODUCTION 

The traditional positivistic approach to evaluating teaching in higher
education has been characterized by a strong emphasis on objectivity in
measurement that excludes attention to the values behind the practice.
The researcher or evaluator is an “objective” data gatherer who relies
heavily on quantitative methods. According to Erickson (1986), the
mainstream paradigm for research on teaching has its roots in the
traditional model of the natural sciences:

“The history of the positivistic research on teaching for the past
20 years is one of analytical bootstrapping with very partial
theoretical models of the teaching process, on the assumptions that
what was generic across classrooms would emerge across studies and
that the subtle variations across classrooms were trivial and could
be washed out of the analysis as error variance.” (p. 131).

Researchers following this paradigm tend to link the idea of teaching
to the idea of treatment, and evaluation to the idea of effectiveness.
Teaching effectiveness, then, is “measured by looking at
end-of-the-year scores or standardized achievement tests, and to
particular teaching practices” (Erickson 1986, p. 131).

A clear example of this paradigm is process-product research, which
strongly supports “direct” instruction: the presentation and recitation
of desired knowledge and behaviors. In this research, the effectiveness
of teaching is “attributable to combinations of discrete and observable
teaching performances per se, operating relatively independent of time
and space” (Shulman 1986, p. 10).

According to Dunkin and Barnes (1986), the process-product research of
the 1960s and early 1970s is the underlying rationale for teaching in
higher education today. But unlike most of this research, which was
conducted at other levels of education, in higher education “the
process part has been assumed on the basis of prescriptive definitions,
or rated by untrained observers, rather than documented through careful
observation” (Dunkin & Barnes, 1986, p. 774).

Other researchers point out that the formal conception of good teaching
in higher education has not resulted from process-product research but
from lists of characteristics or qualities that are used as descriptors
of good teaching. Some of these lists are the result of surveys of
faculty members and students who have been asked to describe what
constitutes “good teaching.” Feldman (1988), Frey (1979), and Marsh
(1997) support the use of these characteristics or behaviors for
designing instruments for assessing teaching quality. They argue that
including multiple dimensions can produce information that is useful as
feedback to faculty about their teaching and for identifying faculty
needs for the improvement of instruction.

Another group of researchers takes a different view. These researchers
claim that teaching should be evaluated “globally” rather than by
attending to particular characteristics or dimensions of instruction.
Cashin and Downey (1992), Cohen (1986), and Abrami et al. (1990; 1993)
are among the researchers who support the use of global items or a
“carefully weighted average of the factor scores” when the ratings are
used for making administrative decisions (Abrami et al. 1990, p. 98).
According to Abrami, even though “good teaching” is a construct with
multiple components, it is more appropriate to evaluate teaching
“globally” when comparing instructors across courses, departments, and
settings. He expresses concern about
construct validity problems that could result if the evaluation is
unable to include all relevant dimensions and characteristics of good
teaching.
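
The “carefully weighted average of the factor scores” mentioned in this discussion can be sketched as follows. The source does not say which dimensions or weights Abrami et al. have in mind, so the dimension names, scores, and weights below are hypothetical and serve only to show the arithmetic of collapsing several dimensions into one global figure.

```python
def weighted_composite(factor_scores, weights):
    """Weighted average of factor scores; weights need not sum to 1."""
    if set(factor_scores) != set(weights):
        raise ValueError("factor_scores and weights must name the same dimensions")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(factor_scores[name] * weights[name] for name in weights)
    return weighted_sum / total_weight


# Hypothetical dimensions, scores, and weights (assumptions for illustration).
scores = {"organization": 4.0, "clarity": 3.0, "rapport": 5.0}
weights = {"organization": 2.0, "clarity": 2.0, "rapport": 1.0}
composite = weighted_composite(scores, weights)
print(composite)
```

The single number is convenient for comparing instructors, but any dimension given a weight of zero, or left out of the form altogether, disappears from the composite, which is the construct validity concern raised here.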

Traditional evaluation under a positivistic paradigm puts strong
emphasis on generalization and on the establishment of cause-and-effect
linkages. In most cases, the evaluation is conducted using rating
scales, semi-structured interviews, personality tests, or
questionnaires (Feldman, 1986, 1989; Falk, 1971). Questionnaires,
however, are the instruments researchers use most often in the
evaluation of college instruction. In most cases, the questionnaires
include global items and/or preordinate, standardized sets of items
about teaching characteristics and dimensions. The evaluation forms are
administered in a standardized way. Findings of these surveys are
commonly analyzed in a way that reduces the results to a rating or
score. The results obtained from the evaluation are then compared with
those obtained by other faculty members or against a predetermined
criterion or standard. When classroom observations are conducted, the
tendency again is towards quantification.
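
The two kinds of comparison described in the preceding paragraph, against other faculty members and against a predetermined standard, can be sketched in a few lines. The function name, the peer ratings, and the cut-off of 3.5 are hypothetical assumptions chosen for illustration; actual institutions define these in their own ways.

```python
def compare_rating(instructor_mean, peer_means, criterion):
    """Return a norm-referenced and a criterion-referenced reading of one score."""
    if not peer_means:
        raise ValueError("peer_means must not be empty")
    peers_below = sum(1 for mean in peer_means if mean < instructor_mean)
    return {
        # Share of peers with a strictly lower mean rating (norm-referenced).
        "share_of_peers_below": peers_below / len(peer_means),
        # Whether the predetermined standard is met (criterion-referenced).
        "meets_criterion": instructor_mean >= criterion,
    }


# Hypothetical mean ratings for five colleagues and an assumed standard.
peers = [3.2, 3.8, 4.1, 4.4, 3.5]
result = compare_rating(4.0, peers, criterion=3.5)
print(result)
```

Both readings depend entirely on the single score: neither comparison carries any information about what the score means or how students arrived at it.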

Studies on the validity of the evaluation of teaching in higher
education have also centered on the validity of student ratings.
Although there is a large number of studies on the validity of student
ratings of instruction, mainly multi-section and multi-trait studies,
there is a need to examine the validity of this research under the new
validity framework developed by Messick (1989), which has resulted in a
shift in the validity literature and influenced the creation of new
standards for student assessment (AERA, APA, and NCME, 1999).


1. VALIDITY AND CURRENT EVALUATIONS OF TEACHING

Validity is concerned with the questions: Are we measuring what we
think we measure? Are our inferences and actions about the evaluand¹
supported by evidence? Because validity is linked to the meaning,
value, and appropriateness of interpretation, it is the most critical
consideration in evaluation. Issues of validity in the evaluation of
teaching are important given the new developments in validity theory.

In the late 1980s and early 1990s, a shift took place in the assessment
literature that resulted in a new conceptualization of validity. Samuel
Messick was responsible for this shift with his famous chapter on
validity (1989), followed by Shepard (1993) and other authors, such as
Lane, Park, and Stone (1998); Moss (1992, 1998); Reckase (1998); Yen
(1998); and Cronbach (1989). The new framework moves away from a
fragmented concept of validity to a unified one. Under this new
framework, all validity is construct validity. As Messick states, the
new framework “integrates considerations of content, criteria, and
consequences into a construct framework for the empirical testing of
rational hypotheses about score meaning and theoretically relevant
relationships including those of an applied and a scientific nature”
(Messick 1995, p. 751).

¹ Evaluation object.

In addition, validity is not a property of a test but “an overall
judgment of the extent to which empirical evidence and theory support
the adequacy and appropriateness of the interpretations based on the
assessment” (Messick 1995, p. 741). Moreover, validity refers not only
to the meanings and interpretation of assessment scores, but also to
the inferences
and social consequences that result from the evaluation. Indeed,
meaning and consequences are essential to validity (Messick, 1989,
1995).

1.1. ASPECTS OF CONSTRUCT VALIDITY 

Messick (1989, 1995) identified six important aspects of construct
validity to be used for all educational assessments to identify sources
of invalidity: content, substantive, structural, external,
generalizability, and consequential. In their research on the validity
of student ratings of instruction, Ory and Ryan (2001) found that
studies have been conducted on some of these aspects of validity, but
that other important aspects have not been addressed by the research.
There is also a need to examine how the evaluation context could raise
issues related to the validity of the student ratings research.

1.1.1. CONTENT VALIDITY 

One of the most important aspects of validity is the capacity of the
evaluation to reflect the content of the construct that it is intended
to measure. This aspect of validity addresses the question: Is there a
relationship between the content of the evaluation and the construct
intended to be measured? In the evaluation of teaching in higher
education, content validity refers to the capacity of the evaluation to
measure teaching quality. Consequently, there is construct
under-representation when the evaluation is not broad enough to measure
all the components of good teaching. There is construct-irrelevant
variance when the assessment includes variables other than teaching
quality.
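
The two threats named in this paragraph can be illustrated with a simple content-mapping check. This is only a rough sketch: it compares the dimensions a form's items claim to address with the dimensions attributed to the construct, and it says nothing about the variance in actual scores. The dimension names and item labels are hypothetical assumptions, not a definition of good teaching.

```python
def content_coverage(construct_dimensions, item_dimensions):
    """Compare what a form's items address with what the construct contains."""
    construct = set(construct_dimensions)
    measured = set(item_dimensions.values())
    return {
        # Dimensions of the construct that no item addresses
        # (a sign of construct under-representation).
        "missing_dimensions": sorted(construct - measured),
        # Dimensions addressed by items but outside the construct
        # (a possible source of construct-irrelevant variance).
        "irrelevant_dimensions": sorted(measured - construct),
    }


# Hypothetical construct definition and item-to-dimension mapping.
good_teaching = {"organization", "clarity", "feedback", "rapport"}
form_items = {
    "q1": "organization",
    "q2": "clarity",
    "q3": "instructor appearance",
}
coverage = content_coverage(good_teaching, form_items)
print(coverage)
```

A check of this kind presupposes an explicit definition of the construct, which is exactly what many evaluation forms lack.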

According to Ory and Ryan (2001), because most evaluation forms “are
developed without too much thought of theory or construct domains”
(p. 11), there is growing concern about the validity of the
interpretations based on evaluation scores. In addition, the use of
standardized procedures for evaluating college teaching raises issues
of construct under-representation if the assessment fails to represent
the construct being measured, either because it lacks important
elements of the construct or because the measurement includes variables
not relevant to this construct (Stake & Cisneros-Cohernour, 2000).

Moreover, validity problems increase because current evaluations of
teaching remain focused on narrow definitions of teaching that are not
consistent with current theories of teaching and learning. While the
research on teaching has evolved from simplistic to more complex
conceptualizations, these changes have not taken place in the
evaluation. Unless there is certainty about the meaning of the scores,
it cannot be said for certain that the evaluation results are valid
representations of instructional quality.

Construct under-representation can occur when the assessment is not
defined broadly enough to include the critical dimensions of the
construct of good teaching; scores cannot be interpreted as reflecting
good teaching if they fail to include all the important elements of the
construct. Differing conditions of learning across classrooms, and
differing ways of implementing course content, can also threaten
validity if, as a result, a subgroup of instructors receives an unfair
advantage in the evaluation. It is also important to determine whether
the assessment encourages a particular type of teaching and learning,
and whether it punishes alternative approaches that stress
non-traditional views of teaching and learning.
1 .1 .3. STRUKTURINIS PAGRjSTUMO ASPEKTAS 
Sis konstrukto pagrjstumo aspektas reikalauja, kad 
„pasirinkto konstrukto teorija sqlygotg ne tik atitinka- 
mg vertinimo uzduocig pasirinkim^ ar konstravim^, bet 
ir racionaig konstruktu pagrjsto vertinimo kriterijg irjg 
dalig sukurim^" (Messick, 1994, p. 15). Strukturinis 
pagrjstumo aspektas atsako j klausimq: „Koks rysys 
tarp skirtingg jvertinimo proceduros komponentg ir 
jvertinamo konstrukto?" jvertinimas turi pagrjsti rysj 
tarp atskirg jvertinimo mechanizmo komponentg bei 
konstrukto strukturos. Taip pat svarbu zinoti, ar rezul- 
tatg sumavimo budas atitinka konstrukto ribas. 


1 .1 .2. SUBSTANTITIVE VALIDITY 

For this aspect of construct validity it is important to analyze 
response processes of those taking the test and completing 
the evaluation forms in order to see if there is a fit between the 
process used to answer and the process for which the as- 
sessment was developed. Evidence of substantial validity can 
be found when there is a fit between what is been tested and 
the construct measured. As Ory and Ryan (2001) illustrate, 
“When an examinee uses critical thinking to answer items on 
a test of critical thinking there is evidence for the substantial 
validity of the test scores.” (p. 14). 

Studies on the substantive validity aspects of construct 
validity focus on questions such as: What accounts for score 
differences? What do we know about the response proc- 
esses in different situations? If students respond more posi- 
tively in a given situation, are they responding more or less 
truthfully? Does the nature of the evaluation process match 
the construct being measured? 

It is not enough to know that the scores change in differ- 
ent situations, it is necessary to know why the change takes 
place. We also need to understand how students use the rat- 
ing scales to respond, and if there is a fit between the in- 
tended meaning of the scale and the meaning of the scale for 
students. It Is important to determine if all students follow simi- 
lar processes when responding to the tests. Do some sub- 
groups of students respond differently than others? Is the 
assessment appropriate for different groups of students of 
diverse ethnic and culturai backgrounds? 

Several studies (Marlin, 1987; Dwinell and Higben, 1993; 
Ballantyne, 1 998) have been conducted about student attitudes 
about the evaluation, specifically towards the student ratings. 
But, little is still known about “the actual process followed by 
students to respond to rating forms.” (Ory and Ryan, p. 26) 
According to these authors “past research efforts have Indi- 
cated how ratings change In different situations but they do 
little to help us understand why the change occurs” (p. 15). 
More research Is also needed to understand how students use 
the rating scales to respond. As Ory and Ryan (2001) state, “If 
Items are presented with a five point Likert scale, how do stu- 
dents Interpret and use the middle category? Do students mark 
a “3” to indicate an inability to respond, a middle response, or 
a lack of interest? If only the endpoints are labeled, how do 
students interpret and use the other scale points? Are some 
students more reluctant than others to use the extreme ends of 
the scale? Do some students believe that a “perfect five” is 
unobtainable? To make valid Inferences from student ratings 
we need to determine if there is proper fit between what the 
meaning of the scale was for students, and the intended mean- 
ing of the scale.” (p. 15) 

An important problem with standardized evaluations of teaching on
campus that has not been addressed, as identified by critical
scholars, is that the scores do not necessarily reflect real
differences among people, and they often do not adequately eliminate
underlying biased cultural assumptions built into the test as a whole.
Consequently, there is a need for research on the substantive validity
of the evaluation of teaching in higher education. The interpretation
and use of evaluation results can be improved if the student response
pattern and the differences in response patterns among different
students are understood. As El-Hassan (1995) states, there is also a
need to examine more deeply how variables related to the instructor
(faculty rank and gender), the course (required versus elective,
academic discipline, class size, workload-difficulty), or the student
(motivation towards the course, expected grades) could influence
student response patterns.

Daugeliu tyrimij bandyta nustatyti gero destymo 
charakteristikas ir procesus remiantis studentg, desty- 
tojij ar administracijos pateiktais duomenimis. Didzioji 
dalis ivertinimo priemoniq yra sukurtos butent tokig ty- 
rirriLj pagrindu. Mokslininkai nustate koreliacijas tarp 
ivairiq savybig, charakteristikq ir destymo reitingavimo. 
Pavyzdziui, Centra (1993) ir Feidman (1976) isanaiiza- 
yq kelet^ jvertinimo formg, rado tarn tikrg bendrumLj. 
Kita vertus, reikia atkreipti demesj j tai, kad studentai 
panasiai atsako j tarn tikrus klausimus ne todei, kad jie 
prikiauso zinomai sriciai. Tarsi butg analizuojami stu- 
dentg atsakymai j simtus matematinig klausimg, su- 
grupuojant juos j atsakymais pagrjstus klasterius, iden- 
tifikuojant kaip esminius gebejimus, kurie padeda is- 
spr^sti matematinius klausimus (Ory ir Ryan 2001 , p. 
18). Galima pamineti Kulik ir McKeachei (1975) bei Feld- 
man (1987) metaanalizes studijas, t.y. nors mokslinin- 
kai kruopsciai isanalizavo daugyb^ skirtingg duome- 
ng, taciau nesugebejo nustatyti destymo kokyb^ api- 
bOdinancig esminig charakteristikg ir eigsenos. Ory ir 
Ryan (2001 , p. 1 1 ) teigia, kad „butina apibrezti tikslias 
efektyvaus destymo charakteristikos ribas. Tg ribg ne- 
apibrezus institucijos kelia klausimg, kokiu pagrindu 
buvo sudarytos jvertinimo formos ir, kas dar svarbiau, 
kaip analizuojami ir interpretuojami gauti duomenys." 
Kadangi neturima empirinig duomeng, kad pasirinkti 
elemental is tiesg atspindi ger^ destym^, butini tolesni 
strukturinio pagrjstumo aspekto tyrimai. Be to, svarbu 
issiaiskinti jvertinimo formg konstrukto ribas: kiek jo 
ribos lemia destymo kokyb? ir kaip skirting! asmenys 
suvokia destymo reitingavimo? 

1.1.4. ISORINIO PAGRjSTUMO ASPEKTAS 
Sis aspektas nurodo jvertinimo rysj su kitais kintamai- 
siais, kurie yra isoriniai atliekamam vertinimui, siekiant 
nustatyti informacijos pagrjstumo akivaizdumo. Taigi 
„taskg reiksme patvirtinama isoriskai jvertinus empirinj 
duomeng atitikimo laipsnj kitg matavimg duomenims, o 
esant skirtumams, ar tie jverciai atspindi jg tikrojo reiks- 
mo“ (Messick 1994, p. 16). Be to „svarbus isorinig san- 
tykig rysiai susiformuoja tarp vertinimo taskg ir kriterijg, 
susijusig su atranka, jdarbinimu, licencijavimu, progra- 
mos jvertinimu bei kitg su atsiskaitomybes procesu sie- 
jamg analizes kriterijg" (Messick 1994, p. 17). 

Ankstesniuose studentg pateikto destymo reitinga- 
vimo pagrjstumo tyrimuose buvo analizuojami svarbus 
pagrjstumo aspektai, bandyta nustatyti ar egzistuoja 
rysys tarp studentg vertinimo ir jg pasiekimg^ . Dazniau- 
siai per tokius tyrimus nustatomas jvertinimo pagrjstu- 
mas analizuojant vieno kurso, kurj desto skirtingi des- 
tytojai, jvertinimo rezultatus ir jg santykj su studentg 
pasiekimais (Ory ir Ryan, 2001). Pavyzdziui, Cohen 
(1981) tyrinejo koreliacijotarp studentg pateikto reitin- 
gavimo ir jg pasiekimg; Murray (1983) jtrauke j tyrimus 
parengtus stebetojus, kurie turejo nustatyti skirtumus 
tarp geriausiai ir prasciausiai jvertintg destytojg. 

Perziurejus didelj kiekj daugiaplanig tyrimg, atliktg 
Abrami, D'Apolloniair Cohen (1990), paaiskejo, kad bu- 
tina tikslesne analize, jei siekiama suprasti, kokios yra 
apibendrinamumg ribos, kriterijg efektyvumas, reitinga- 
vimo dimensijos ir destymo s^lygos. Kadangi grupig ho- 

^ Parodomg pazymiais 


1.1.3. STRUCTURAL ASPECT OF VALIDITY

This aspect of construct validity stresses that “the theory of the
construct domain should guide not only the selection or construction
of relevant assessment tasks, but also the rational development of
construct-based scoring criteria and rubrics” (Messick, 1994, p. 15).
This aspect of validity addresses the question: To what extent does
the relationship among different components of the evaluation
procedures correspond with the construct being evaluated? In this way,
the evaluation needs to provide evidence that the relationship among
the different components of the assessment instrument corresponds with
the structure of the construct domain. It is also important to know
how consistent the scoring structure is with the construct domain.

A large number of studies have been conducted to determine the
characteristics or behaviors that constitute good teaching, based
mostly on the perceptions of students, teachers or administrators.
Most evaluation instruments are based on those characteristics. Some
researchers have also found correlations between these and other sets
of characteristics and behaviors and the ratings of instruction. For
example, Centra (1993) and Feldman (1976) found common dimensions
after analyzing several evaluation forms. However, it is important to
consider that items are included on many forms because students appear
to respond similarly to particular ones, not because they come from a
known domain of targeted characteristics. It is somewhat like
analyzing student responses to hundreds of math items, grouping the
items into response-based clusters, and then identifying the clusters
as essential skills necessary to solve math problems (Ory & Ryan,
2001, p. 18). In addition, some studies, such as the work of Kulik and
McKeachie (1975) and Feldman (1987), were meta-analyses. Although
there is consistency across the different data analyzed, the
researchers have not been able to identify a single set of
characteristics and behaviors as those essential for defining the
construct of teaching quality. As Ory and Ryan (2001) state, “Without
a clearly defined target domain of effective instructional
characteristics, it is unclear how institutions select the content of
their evaluation forms, and more importantly, what do these
institutions infer as the meaning of their ratings” (p. 11). Since
there is no empirical evidence that the items selected are indeed
elements of good teaching, there is a need for more research on this
aspect of construct validity. Consequently, an important area of
research needs to focus on the following questions: What is the
construct domain in the evaluation forms? How does this domain relate
to instructional quality? What is the meaning of the ratings for
different stakeholders?

1.1.4. EXTERNAL ASPECT OF VALIDITY

This aspect refers to the relationship of the evaluation to other
variables, external to the assessment, that provide a source of
validity evidence. In this way, “the meaning of the scores is
substantiated externally by appraising the degree to which empirical
relationships with other measures, or the lack thereof, is consistent
with that meaning” (Messick, 1994, p. 16). In addition, of “special
importance among external relationships are those between the
assessment scores and criterion measures pertinent to selection,
placement, licensure, program evaluation, or other accountability
purposes in applied settings” (Messick, 1994, p. 17).

Prior research on the validity of student ratings has been con- 
ducted to address this important aspect of validity. Some of these 
studies have been conducted to determine if there is a relation- 
ship between student ratings and student achievement.^ The 
multisection studies are an example of this kind of research that 
determines the validity of the evaluation by analyzing the correla- 

^ Defined as grades. 

mogeniskumas ir kiti veiksniai kinta, koreliaciniq tyrimq 
rezultatai turetq buti traktuojami labai atsargiai. Be 
to, mineti tyrimai dazniausiai buvo atlikti pirmakursiams 
ir antrakursiams skirtuose ivadiniuose kursuose. 

Daugiaplaniuose tyrimuose studentq pateikti rei- 
tingavimai lyginti su kitokiais duomeni! saltiniais, pa- 
vyzdziui, bendraamziq ir absoiventq pateiktais reitin- 
gavimais, savianalize ir t.t. Taip buvo siekiama patik- 
rinti, ar rezuitatai, gauti is skirtingq saitiniq, nepriesta- 
rauja vieni kitiems. Paaiskejo, kad studentq ir absoi- 
ventq atsakymai akivaizdziai sutampa, buvo nustaty- 
tos stiprios teigiamos koreiiacijos tarp siq kintamqjq. 
Kituose tyrimuose buvo lygintos skirtingos duomenq 
rinkimo formos, pavyzdziui, ar skiriasi studentq atsa- 
kymai i uzdarus ar atvirus kiausimus, grupes interviu 
irt.t. (Ory, Braskamp ir Pieper 1980; Ory ir Ryan 2001). 

1 .1 .5. PAGRjSTUMO APIBENDRINAMUMO ASPEKTAS 
Vertinimo pagrjstumo apibendrinamumo aspektu sie- 
kiama nustatyti, ar egzistuoja rysys tarp „jau jvertintq 
uzduociq ir kitq uzduociq, kurios reprezentuoja kons- 
trukt^ ar atskirus jo aspektus" (Messick 1994, p. 15). 
Apibendrinamumo aspektas rodo „jverciq reiksmiq ri- 
bas" (Messick 1994, p. 15). 

Pagrjstumo apibendrinamumo aspektas kelia to- 
kius kiausimus: „Ar gaiima iyginti skirtingq daiykq ap- 
iinkos ir skirtingq iaikotarpiq jvertinimq reiksmes? Ar 
gaiima daryti tas pacias isvadas apie jvertinimus, at- 
iiktus skirtingoje aplinkoje? Ar pagrjsta lyginti jvercius, 
naudotus skirtingiems tiksiams? Ar jverciai, gauti skir- 
tingoje apiinkoje gaii buti iyginami?" Apibendrinamu- 
mo tyrimais bandoma nustatyti ir atskieisti skirtumus, 
suprasti, kodel jie atsiranda, ismokti, kaip juos api- 
bendrinti pristatant vertinimo rezuitatus ir siekiant su- 
stiprinti vertinamo proceso pagrjstum^. 

Nors kai kurie moksiininkai teigia, kad skirtingq 
grupiq studentq vertinimus gaiima apibendrinti, Abra- 
mi, d'Apoiionia ir Cohen (1990) tuo abejoja, sakyda- 
mi jog „didzioji daiis tyrimq buvo atiikti jvadiniuose pir- 
makursiams ir antrakursiams skirtuose kursuose" (Ory 
ir Ryan 2001 , p. 20). Taigi reikalingi detaiesni sio kon- 
strukto pagrjstumo tyrimai. 

1.1.6. PASEKMES PAGRjSTUMAS 

Vertinimo pasekmiq pagrjstumo aspektas rodo trum- 
paiaikes ar iigalaikes vertinimo ir jo rezuitatq interpre- 
tavimo pasekmes (Wilson, 1999). Svarbios tiek numa- 
tomos, tiek nenumatomos pasekmes. Pasekmiq pa- 
grjstumas skatina tyrineti jvertinimo teorijq vertybes, 
isvadas ir principus, taip pat ideology^, kurioje teorija 
yrataikoma. Dideiis demesys skiriamas pasekmems, 
„sietinoms su vertinimo saiiskumu, neteisinga inter- 
pretacija ar nes^ziningu testo taikymu" (Messick, 
1994, p. 17). 

jvertinant destym^ aukstojoje mokykioje, pasekmiq 
pagrjstumas kol kas nesulauke pakankamo tyrinetojq 
demesio; dei to reikalingi tiksiesni jvertinimo saiisku- 
mo, numatomq ir nenumatomq pasekmiq tyrimai. Svar- 
bu jvertinti „teorijos principus ir potenciaiias ar esamas 
probiemas, su kuriomis gaii susidurti institucija" (Ory ir 
Ryan 2001 , p. 26); nustatyti potenciaiias neigiamas jver- 
tinimo pasekmes, gaiincias kiiti dei nepakankamo rep- 
rezentatyvumo, saiiskumo ar nes^ziningumo. 


tion of evaluation results of a single course that is taught by differ- 
ent instructors with the section mean of student achievement (Ory 
and Ryan, 2001). In addition, Cohen (1981) conducted correlation
studies on the relationship between student ratings and student
achievement, and other researchers such as Murray (1983) used trained
observers to determine teaching differences among instructors who
obtained high and low ratings.
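
The multisection design described above can be sketched in a few
lines. This is only an illustration under stated assumptions, not the
procedure used by Cohen (1981) or any other cited study: each record
is taken to be one student's rating of the instructor and that
student's achievement score, the function names and data are
hypothetical, and the validity coefficient is taken to be the Pearson
correlation between section mean ratings and section mean achievement.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def multisection_validity(records):
    """Correlate section mean ratings with section mean achievement.

    records: iterable of (section_id, rating, achievement) per student.
    The section, not the student, is the unit of analysis.
    """
    by_section = {}
    for section, rating, achievement in records:
        by_section.setdefault(section, []).append((rating, achievement))
    mean_ratings = [mean(r for r, _ in rows) for rows in by_section.values()]
    mean_scores = [mean(a for _, a in rows) for rows in by_section.values()]
    return pearson_r(mean_ratings, mean_scores)


# Hypothetical data: three sections of the same course, two students each.
records = [("A", 4, 80), ("A", 5, 90),
           ("B", 3, 70), ("B", 3, 74),
           ("C", 2, 60), ("C", 4, 70)]
r = multisection_validity(records)
```

With only a handful of sections the coefficient is very unstable, and
it says nothing about why ratings and achievement move together.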

A review of several dozen multi-section studies conducted by Abrami,
d’Apollonia, and Cohen (1990) showed that, although the findings are
consistent, more research is needed to understand the limits on the
generalizability of rating validity across rating dimensions,
effectiveness criteria, and conditions of instruction. As is well
known, findings from correlational research need to be interpreted
carefully because group homogeneity and other factors vary. In
addition, many of these studies were conducted in lower-learning,
introductory courses taught primarily to freshmen and sophomores.

Multi-trait studies have also compared the results obtained from
student ratings with other data sources, such as peers, alumni, and
self-ratings, to determine how consistently different sources evaluate
teaching. Researchers have found high positive correlations between
student ratings and alumni ratings. In addition, another group of
studies has examined the correlation between different forms of data
collection, such as student overall ratings of instructor competence
as measured by rating items, written comments to open-ended items, and
group interviews (Ory, Braskamp and Pieper, 1980; Ory and Ryan, 2001).

1.1.5. GENERALIZABILITY ASPECT OF VALIDITY

This aspect examines whether there is a correlation “of the assessed
tasks with other tasks representing the construct or aspects of the
construct” (Messick, 1994, p. 15). Generalizability refers to the
“boundaries of score meaning” (Messick, 1994, p. 15).

Generalizability as an aspect of construct validity addresses 
questions such as: Can we make comparable inferences about 
the meaning of the scores across subjects, settings, and time? 
Can we make the same inference about ratings collected in differ- 
ent settings? Can we make valid comparisons between scores 
used for one purpose versus another purpose? Are assessment 
scores collected in different settings comparable? Generalizability 
studies focus on determining differences, understanding why they 
occur, and learning how to account for them in reporting assess- 
ment results to enhance the validity of the assessment process. 

Although some researchers support the generalizability of student
ratings across different sections, other researchers such as Abrami,
d’Apollonia, and Cohen (1990) have questioned the generalizability of
the evaluations of teaching using student ratings “because so many of
the studies were conducted in lower-learning, introductory courses
taught primarily to freshmen and sophomores” (Ory and Ryan, 2001,
p. 20). More research is needed on this important aspect of construct
validity.

1.1.6. CONSEQUENTIAL VALIDITY

This refers to the short and long-term consequences of evaluation 
use and the consequences associated with interpretations of evalu- 
ation scores (Wilson, 1999). Both intended and unintended con- 
sequences are important. Consequential validity also implies the 
need for appraising the value implications of the theory underly- 
ing evaluation scores, as well as the ideology in which the theory 
is embedded. When collecting evidence about the consequential
aspect of validity, special emphasis is given to consequences
“associated with bias in scoring and interpretation or with
unfairness in test use” (Messick, 1994, p. 17).

POZITYVISTINES PARADIGMOS 
RIBOTUMAI 

Pozityvistinio destymo ivertinimo privalumai yra orga- 
nizuotumas ir paprastumas, taciau, daugumos auto- 
rig nuomone, pastebeti ribotumai sumenkinasiuos pri- 
valumus. Standartizuotas studentg vertinimas gali tu- 
reti jvairig neigiamg pasekmiq. 

Pirma, isskirtinis demesys charakteristikoms ar elg- 
senos savybems „riboja zinias apie destym^ ir studija- 
vim^“ (Dunkin ir Barnes, 1986, p. 774). Toks jvertini- 
mas dazniausiai yra orientuotas j destytoj^, t.y. desty- 
tojas turi „nuosekiiai ir aiskiai perteikti tarn tikr^ kurso 
medziag^, o studentas jsisavinti kurs^ atlikdamas tra- 
dicines uzduotis tradiciniais studijavimo metodais" 
(Centra ir Bonesheli, 1990). is to ispiaukia, kad alter- 
natyvius destymo budus taikantys destytojai yra kriti- 
kuojami. Gaiimas neatitikimas tarp jvertinimo ir bet ko- 
kio destymo, besiremiancio konstruktyvistine mokymo- 
si teorija ar asmenybes bei kognityvinio vystymosi te- 
orijomis (Mabry, 1999). 

Nors charakteristikg ir eigsenos apibendrinimai 
gauti proceso ir rezuitato tyrimq metu koreiiuoja su stu- 
dentg eigesiu per egzaminus ir testus, kyla abejoniq 
dei pagrjstumo. Pagrjstumui jrodyti reikaiingi tiksiesni 
empiriniai tyrimai. Nustatyti elgesio ypatumai ir cha- 
rakteristikos remiasi ivairiq duomeng anaiize, taciau „tu- 
rima mazai jrodymg, kad destytojg elgesys auditorijo- 
je visiskai atitinka apibrezt^ elgesio standartq" (Shul- 
man 1986, p. 12). Probiemiskas ir apkiausg metu nu- 
statytg destytojg ir studentg eigesio charakteristikg, bu- 
dg, bruozg ir eigesio ypatumg panaudojimas tyrimui. 
Empirinig duomeng nepakanka norint tvirtinti, kad sie 
bruozai atspindi ger^ destym^ ar kad jie s^lygoja stu- 
dijavim^ (Miller, 1974; Genova, 1986). 

Turinio pagrjstumo problemg atsirandatuomet, kai 
vertinimo kriterijus sudaro ne visos budingiausios cha- 
rakteristikos ir eigesio ypatumai. Kaip jau mineta, per 
piatus ar per siauras destymo apibrezimas gali neat- 
spindeti visg gaiimg destymo situacijg (Stake ir Cisne- 
ros-Cohernour, 2000). Remiantis Doyie (1982, p. 27) 
nejmanoma, kad tas pats charakteristikg rinkinys ga- 
li vienodai tikti destant skirtingus daiykus skirtingiems 
studentams, skirtingomis aplinkybemis [...]. Tokio s^- 
raso sudarymas iabai rizikingas". Kai destymo koky- 
bei nustatyti taikomos jvairios charakteristikos, jg presk- 
riptyvinis panaudojimas gaii riboti destymo kurybingu- 
m^ ir stabdyti profesinj tobuiejim^. Destymo budg ir/ 
ar bruozg kaip kriterijg taikymas destymo kokybes jver- 
tinime, varzo destytojg darbo jvairov^, gali nukenteti 
destytojai „nepatenkantys j nustatytus rernus" (Stake 
ir Cisneros-Cohenour, 2000). 

Tendencija destymo kokyb? apibendrinti skaitme- 
niniais rodikiiais gali isprovokuoti pastangas gerinti 
tik rezuitatus, o ne darbo kokyb^ (Cisneros-Coher- 
nour, 1997). Lyginimo rezuitatai netaikant grieztos 
destymo ir studijavimo process iemiancig kintamgjg 
kontroies gaii buti neteisingi (Stake ir Cisneros-Co- 
henour, 2000). 

Taip pat svarbu atsizveigti j ekspertg, jvertinancig 
destymo kokyb§, objektyvum^. JAV daugelis eksper- 
tg, tikrinancig destymo jvertinimo pagrjstumo auksto- 


The consequential validity of the evaluation of teaching in higher
education is an area that has received little attention from
researchers. More research is needed on the value implications of the
evaluation results, the intended and unintended consequences of using
certain criteria for defining and assessing good teaching, “the
ideology within which the theory is imbedded, (and) the potential or
actual problems that could result for the institution as a result of
the consequences” (Ory and Ryan, 2001, p. 26). More studies are needed
to determine the potential negative consequences of the evaluation,
especially in regard to issues of bias and fairness, primarily in
relation to minority and other underrepresented groups among the
faculty.


2 

LIMITATIONS OF THE POSITIVISTIC 
PARADIGM 

A positivistic orientation to the evaluation of teaching has the
benefit of organization and simplicity. But many see the limitations
and problems surpassing its benefits. As with standardized student
assessment, this orientation could result in some serious negative
consequences.

First, exclusive use of characteristics or behavioral attributes is
“limiting to a certain kind of knowledge about teaching and learning”
(Dunkin and Barnes, 1986, p. 774). The evaluation usually centers on
teacher-centered teaching, that is, teaching in which the instructor’s
task “is to cover a well defined set of topics for a course
systematically and precisely, while the student’s task is to master
the course content through traditional assignments and study methods”
(Centra & Bonesteel, 1990). Instructors using a different teaching
approach or style may be at a disadvantage. There can be a mismatch
between the evaluation and any teaching consistent with constructivist
learning theory, as well as with theories of human and cognitive
development (Mabry, 1999).

The use of a list of characteristics and behaviors from the
process-product research, although correlated with student performance
in exams or tests, raises validity questions. Their validity has not
been determined empirically. These behaviors and characteristics are
the outcome of synthesis from aggregate data, but there is “little
evidence that any observed teacher had ever performed in the classroom
congruent with the collective pattern of the composite” (Shulman,
1986, p. 12). The use of characteristics, styles, traits and behaviors
identified from surveys of faculty and students also presents a
problem. There is little empirical evidence that any of them
constitute good teaching, or that they are related to student learning
(Miller, 1974; Genova et al., 1986).

Problems of content validity arise when not all relevant
characteristics and behaviors used as criteria are included in the
assessment. As said earlier, using an overly general or overly narrow
definition of teaching is problematic because it may not be
appropriate for all teaching situations (Stake and
Cisneros-Cohernour, 2000). Doyle (1982) says, “... it seems most
unlikely that any one set of characteristics will apply with equal
force to teaching all kinds of materials to all kinds of students
under all kinds of circumstances... To prepare such a list entails a
substantial risk” (p. 27). When a number of characteristics are
adopted as indicators of teaching quality, their prescribed use can
limit instructional creativity and become a barrier to professional
development. The use of traits and/or teaching styles as criteria for
evaluating teaching constrains diversity among instructors, penalizing
those who do not “fall within the norm” (Stake and
Cisneros-Cohernour, 2000).

siose mokyklose atsiduria dviguboje padetyje kaip 
mokslininkai ir tie, kurie organizuoja ir atlieka jvertini- 
m^. Jq moksline veikla tiesiogiai arba netiesiogiai daz- 
niau mazina jq kaip institucijos administratoriq veik- 
los pagrjstum^. 

JAV paskelbtose publikacijose apie destymo ty- 
rimus laikomasi gana vienodos nuomones, tik kai 
kuriuose darbuose priestaraujama sios tyrejq gru- 
pes gautiems tyrimq rezultatams. Isimtis - Kana- 
dos mokslininko Brodie darbas. 1998 metais Bro- 
die atliktq tyrimq rezultatai parode, kad ankstes- 
niuose darbuose, kuriuose buvo nustatytas rysys 
tarp aukstesnio pazymio ir studentq pateiktq rei- 
tingavimq, „nepakankamai jvertinta tendencija ra- 
syti aukstesnius jvertinimus del grieztumo stokos" 
(1998, p. 17). Brodie issiaiskino, kad „kai to paties 
dalyko jvertinimai atskirose grupese smarkiai sky- 
resi, destytojai, rasantys geresnius pazymius ir uz- 
duodantys maziau darbq, gaudavo aukstesnius 
jvertinimus" (1998, p. 17). Be to, Brodie (1999), ana- 
lizuodamas santykj tarp studentq pateiktq reitinga- 
vimq ir jq pasiekimq, rado naujq nesutapimq. Ne- 
sutapimq pasitaike ataskaitose apie tyrimq rezul- 
tatus ir ta pacia tema skelbtq straipsniq. Mokslinin- 
kas teigia, kad nesutapimai tarp destymo charak- 
teristikos ir reitingavimq, skelbtq keliuose leidinluo- 
se, atsirado del to, kad „kal kurie mokslininkai tle- 
siog Istryne nelgiamas ar zemas korellacijas Ir su- 
falsifikavo vertinimo skal? paversdami telgiama" 
(1999, p. 1). Nors niekas nesidomejo, klek Brodie 
telginial telsingi, vis delto svarbu atsizveigti j dve- 
jop^ tyrinetojo - mokslininko ir jvertinimo sistemos 
administratoriaus - vaidmenj. 

Kiti pozityvistines paradigmos kritikai teigia, kad 
mokslininkai per daug demesio skiria skaiciavimams 
ir pavercia moksl^ technika. Horkheimer ir Adorno 
(1948, p. 11) nuomone, „jie pakeite koncepcij^ for- 
mulemis, o priezastingum^ taisykiemis ir tikimybe- 
mis.“ Magunsson (2000), diskutuodama del sklrtln- 
gos kulturines ir etnines kilmes akademinio persona- 
lo destymo kokybes nustatymo pagrjstumo ir tinka- 
mumo, yra pasakiusi, kad „tyrimai, kuriuose „mazu- 
mos“ priskiriamos nezymiems sistemos nukrypimams 
ar panasioms su skaiciavimq sistema susijusioms 
sampratoms, is esmes neatsisako techninio psicho- 
metrijos diskurso problemq tyrimo. Kitaip tariant, jei- 
gu rasizmas egzistuoja, jis budingas visai organiza- 
cijai, negali buti tik nezymus skaiciavimo sistemos nu- 
krypimas" (Magunsson 2000, p. 45). 

Menges (1998) teigia, kad reikia tirti, kaip inter- 
pretuojama ir naudojama informacija apie jvertinimus, 
kaip mokytojai „pritaiko jvertinimo rezultatus planuo- 
dami, dirbdami auditorinj darb^ ir vertindami savo 
destymo lygj" (p. 3). Po to Menges priduria, kad pa- 
grindinis tyrimq trukumas yra „destymo konteksto ne- 
paisymas [...], skirtingo dalyviq poziurio ir jq asmeni- 
niq, organizaciniq ir politiniq ypatumq ignoravimas" 
(p. 4). 

Teigiamo pozityvistinio modelio aspektu laikytina 
tai, kad vis labiau domimasi tyrimq prielaidq tikrini- 
mu ir jvertinamo konstrukto pagrjstumu (Menges, 
1998; Theall ir Franklin 1999, 2000; Ryan ir Johnson 
1998; Ory ir Ryan 2001). 


The tendency to summarize teaching quality in a numerical index can
lead to the unintended consequence of people focusing more on
improving the scores than on improving their teaching
(Cisneros-Cohernour, 1997). Comparisons made without a rigorous
control of variables influencing the teaching and learning process
can lead to unfairness (Stake and Cisneros-Cohernour, 2000).

It is also important to review the claims of objectivity made by those
conducting research on the evaluation of teaching. In the U.S., most
of those conducting research on the validity of the evaluations of
teaching in higher education have a dual role, as scholars and as
those who develop and implement the evaluation. Their scholarly work
directly or indirectly more often than not supports the validity of
their work as administrators in the institution.

The publications of teaching research in the U.S. contain few studies
that contradict the main findings of this research community. An
exception is the work of Brodie, a Canadian researcher, who in 1998
found that prior studies on the relationship between grade inflation
and student ratings “have underestimated the biasing effect of grading
leniency” (p. 17). In that study, Brodie found evidence that “when
grades varied markedly across sections of the same course, the
professors assigning highest grades with less studying received
highest evaluations” (p. 17). In addition, Brodie’s (1999) review of
the research on the correlation between student evaluations of
teaching and student learning raised new issues. He found
discrepancies between the results in research reports and the
published articles of the same study. He encountered evidence that
correlations between certain teaching characteristics and the ratings
as reported in several journals have been inflated and that “some
researchers have deleted low and/or negative correlations, but also
created positive correlations by reversing the rating scale” (p. 1).
No research has yet been conducted to confirm Brodie’s findings, or to
examine the influence of the dual role of the researcher as scholar
and as administrator of the evaluation system. These important
questions deserve more attention.
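
The arithmetic behind the concern about reversed rating scales is easy
to show. The sketch below is an illustration only and is not evidence
for or against Brodie’s claim: the data are invented, the function
names are hypothetical, and a 1 to 5 scale is assumed. Reversing a
scale maps each rating x to (low + high - x), which flips the sign of
any correlation with another variable while leaving its size
unchanged.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def reverse_scale(ratings, low=1, high=5):
    """Reverse-code ratings: on a 1-5 scale, 1 becomes 5 and 5 becomes 1."""
    return [low + high - r for r in ratings]


# Invented data: ratings on one item and a second variable.
ratings = [1, 2, 2, 4, 5]
outcome = [10, 8, 9, 5, 3]

r_original = pearson_r(ratings, outcome)                 # negative here
r_reversed = pearson_r(reverse_scale(ratings), outcome)  # same size, positive
```

Reverse-coding is a legitimate step for negatively worded items. The
issue raised in the text is whether the direction of the scale is
reported, since the sign of a published coefficient cannot be
interpreted without it.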

Other critics of the positivistic paradigm perceive that the emphasis
researchers have put on measurement has been so strong that it has
replaced science with technique. As Horkheimer and Adorno state, “they
have replaced the concept with the formula, and causation with rule
and probability” (1948, p. 11). Magunsson (2000), in her discussion of
the appropriateness and validity of the evaluation for assessing the
quality of teaching of instructors of diverse cultural or ethnic
backgrounds, adds:

“The problem with an analysis that equates ‘minority’ with small
systemic variance, or other such measurement concepts, is that it
constructs the issue once again within the technical discourse of
psychometrics. The problem is that if there is racism, this is
systemic to the entire organization and can’t be reflected merely as
systemic variance related to measurement” (Magunsson, 2000, p. 89).

Menges (1998) also claims that more research is needed, since little
is yet known about how the information from the evaluation is
interpreted and used, and about how teachers “use the evaluation in
planning, implementing, and appraising their own teaching” (p. 3). He
adds that the main shortcoming of the research is “its lack of
recognition of the context of teaching ... (ignoring) the perspectives
of different participants, and their personal, organizational, and
political contexts” (p. 4).

What is promising is that among the researchers supporting the
positivistic paradigm there is a growing interest in testing the
assumptions held by the research, and in examining the validity of the
construct being evaluated (Menges, 1998; Theall & Franklin, 1990,
2000; Ryan & Johnson, 1998; Ory and Ryan, 2001).




ISVADOS 

Isaugus atsiskaitomybes poreikiui jvertinti destym^ 
aukstojoje mokykioje, neformalus poziuriai tape sis- 
teminiais. Be to, administraeijai emus dometis is- 
matuojamais rezultatais, destytojai erne rupintis jver- 
tinimo objektyvumu ir tuo, kaip jis gali s^lygoti jq 
darbo sutartis, paaukstinimo ar algos pakelimo ga- 
limybes. 

Kadangi tyrimai apie destym^ ir studijavim^ tapo 
kompleksiskesni ir issamesni, kilo daug naujq klau- 
simq del tradicinio pozityvistinio poziGrio pagrjstu- 
mo. Pagal sj poziurj destymas yra susij^s su efekty- 
vumo ivertinimu. Geras destymas apibreziamas kaip 
idealiq charakteristikq ar eigsenos normq rinkinys, 
kurio destytojas turi iaikytis. Taciau, kai kurie tyrine- 
tojai ger^ destym^ laiko globaiiu konstruktu, priesin- 
gu savybiq, eigsenos ar dimensijq rinkiniui. Pozity- 
vistinio poziurio tyrimuose pabreziama skaiciavimq 
probiema; didelis demesys skiriamas studentq pa- 
teiktq reitingavimq patikimumui ir stabiiumui, veiks- 
niq, galinciq padaryti neigiam^ jtakq jvertinimui, nu- 
statymui. Atiikti tyrimai apie is skirtingq saltiniq gau- 
tus jvertinimo privaiumus ir trukumus (studentq, ben- 
draamziq, isoriniq stebetojq, administratoriq vertini- 
mus, savianaiiz^, irt.t.), studentq pateiktq reitingavi- 
mq ir jvairiq kintamqjq, pavyzdziui, jq pasiekimq, san- 
tykj. Taip pat analizuotas jvertinimo formq vidinis nuo- 
seklumas (punktq anaiize) ir reitingavimq stabilumas 
laiko skaleje. 

Pozityvistinio poziurio saiininkai jsitikin^, kad stu- 
dentq pateikti destymo kokybes reitingai yra patiki- 
mas saitinis. Moksiininkai nustate koreliacijas tarp stu- 
dentq pateiktq reitingq ir jq pasiekimq^ bei to paties 
destytojo reitingq pastovumo skirtingose grupese. 
Daugiapianiuose tyrimuose buvo jrodymq apie diskri- 
minantinj ir konvergentinj reitingq pagrjstum^. Be to, 
nagrinejant gaiimus kintamuosius, kurie neigiamai jta- 
koja reitingavimq paaiskejo, kad neigiamos jtakos pa- 
grjstumui gali tureti kurso pobudis (pasirenkamas ar 
privalomas) ir destomas dalykas. 

Critics of the positivistic approach argue that the importance of
generalization and of establishing cause and effect is overestimated.
Researchers studying student ratings have been unable to identify the
essential elements of the construct of good teaching. The excessive
attention given to meta-analyses of student ratings is also
criticized, especially when they drive administrative decisions that
can affect faculty careers. In addition, earlier research on the
validity of evaluations of teaching in higher education relies on a
traditional approach.

The main limitation of the research concerns how the construct being
evaluated is characterized. Although there have been a few attempts to
apply the generalizability and external aspects when assessing the
validity of student responses, the conceptual, substantive, and
consequential aspects of validity have not yet been studied. It also
remains to be determined whether the evaluation objectively reflects
the quality of teaching, and how decision makers and instructors apply
the findings to professional development or administrative decisions.


CONCLUSIONS 

The evaluation of teaching in higher education has evolved from
informal to systematic approaches as pressures for accountability
increased at this level of education. In addition, as administrators
began to worry about measuring outcomes, concerns have increased among
faculty about the fairness of evaluations and the use of their results
in administrative decisions such as tenure, promotion, and salary
increases.

As research on teaching and learning has evolved toward a more complex
understanding of these processes, new questions have been raised about
the validity of the traditional positivistic approach to evaluating
teaching in higher education. Under this approach, teaching is linked
to the idea of treatment, and evaluation to the idea of effectiveness.
Good teaching is defined as a set of ideal characteristics or
behaviors expected from the instructor, although some researchers
define teaching as a global construct rather than a set of
characteristics, behaviors, or dimensions. Studies under the
positivistic approach have emphasized measurement issues, looking
especially at the reliability and stability of student ratings of
instruction, as well as at several variables that could negatively
influence the evaluation. Other studies have examined the strengths
and limitations of different evaluative sources (e.g., student
ratings, peers, external observers, administrators, and
self-evaluation) and the relationship between student ratings and
variables such as student achievement. Reliability studies have
addressed the internal consistency of the evaluation forms (item
analysis) and the stability of the ratings over time.

Supporters of the positivistic approach to evaluating teaching claim
that student ratings of instruction are reliable sources for
evaluating teaching. Researchers have found correlations between
student ratings and student achievement^ and stability of the ratings
when comparing the ratings of the same instructor in different
sections. Multi-trait studies have also found some evidence of
discriminant and convergent validity of the ratings. In addition,
researchers studying variables that could negatively influence the
ratings have found evidence of some bias, primarily from course type
(required versus elective) and course discipline.

Critics of the positivistic approach state that the research has
overstressed generalization and the establishment of cause-and-effect
linkages. Studies on the dimensionality of student ratings have failed
to identify the essential elements of the construct “good teaching.”
The overreliance on meta-analyses of student ratings of instruction
has also been questioned, especially when evaluation data are used for
making administrative decisions that can affect faculty careers. In
addition, prior research on the validity of evaluations of teaching in
higher education has been conducted using a traditional approach to
validity.

But the main limitation of the research concerns the validity of the
evaluation in representing the construct being evaluated. Although
some research on the validity of student ratings has addressed the
generalizability of the ratings and their external validity, no
research has been conducted on the conceptual, substantive, and
consequential validity aspects of the evaluation. There is also a need
to understand whether the evaluation fairly represents the quality of
teaching within its context, and how decision makers and teachers use
evaluation results for professional development and for making
administrative decisions.


^ Defined as grades.